
Juan/sc 10379/list return types in the list tests output #383

Merged
johnwalz97 merged 13 commits into main from juan/sc-10379/list-return-types-in-the-list_tests-output on Jul 17, 2025

Conversation

@johnwalz97 (Contributor) commented on Jun 20, 2025:

Pull Request Description

What and why?

This PR adds two columns to the list_tests output, specifically 'has tables' and 'has figures'. These columns let users know what to expect from a test's output before they run it.

How to test

To test, open a notebook and run this snippet of code:

from validmind.tests.load import list_tests

# Show the styled table of tests, including the new output columns
list_tests()

# To work with the raw (unstyled) results instead, pass pretty=False, e.g.:
# len(list_tests(pretty=False))

What needs special review?

Please let me know what you think about the way the information is presented, specifically via flags. We could also combine the output information into two columns.

Dependencies, breaking changes, and deployment notes

There should be no breaking or logical changes besides the additional columns since the majority of changes are just return type annotations.

Release notes

Added columns to the list_tests() output to show what kind of artifacts each test in the ValidMind library produces.

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

@johnwalz97 added the enhancement (New feature or request), highlight (Feature to be curated in the release notes), and 🚧 Work In Progress labels, and removed the highlight label on Jun 20, 2025
- Added support for inspecting return types to determine if they include figures or tables.
- Introduced new type hints for Matplotlib and Plotly figures.
- Updated the test listing function to include flags for presence of figures and tables in the output.
@juanmleng (Contributor) commented:

This looks very nice! I think outputting two flags makes the result clear and easy to interpret. However, I have a couple of comments on that:

  • Currently, when a test does not include a return type annotation, we default both flags to False. However, this could be confusing in edge cases where the test only outputs text. To make this more transparent, we could consider setting the output to "unknown" in such cases and issuing a warning to inform users that the test lacks a return type.
  • Related to the above, when a test returns a string, we don’t explicitly indicate that. While this is a bit of an edge case, it is still a valid scenario. One possible approach is to treat tests with no return type annotation as "unknown" and leave both flags as False. This would implicitly suggest that the test returns a string, although it's not very transparent.
  • Adding a third flag such as has_text might be overkill for this rare case. On the other hand, doing so would also document the "art of the possible" by acknowledging that tests can return text. That said, this is likely not common enough to strongly influence the design.
  • It would perhaps also be quite useful to include dedicated filters to list tests by has_figure and has_table.

Example:
[Screenshot attached: Screenshot 2025-06-25 at 13 01 49]

@johnwalz97 (Contributor, Author) commented:

> This looks very nice! I think outputting two flags makes the result clear and easy to interpret. However, I have a couple of comments on that: […]

Hmm, good points.

So for any custom test that doesn't have return type annotations, we won't be able to tell what the outputs are. I can clarify the warning message to make it clearer what users should do, and I can set the column to "Unknown" like you mentioned.

As far as adding a Has Text column, I think that might be overkill for now but definitely something we can keep on the back burner.
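
For context, a rough sketch of how annotation-based inspection with an "Unknown" fallback could look. The helper name and type checks below are illustrative assumptions, not necessarily what this PR implements:

import typing
from typing import get_type_hints

import matplotlib.figure
import pandas as pd
import plotly.graph_objects as go

FIGURE_TYPES = (matplotlib.figure.Figure, go.Figure)


def inspect_test_outputs(test_func):
    """Return (has_figures, has_tables), or ("Unknown", "Unknown") when
    the test function has no return type annotation."""
    return_type = get_type_hints(test_func).get("return")
    if return_type is None:
        # No annotation: we cannot tell what the test outputs
        return "Unknown", "Unknown"

    # Flatten e.g. Tuple[pd.DataFrame, go.Figure] into its component types
    components = typing.get_args(return_type) or (return_type,)
    has_figures = any(
        isinstance(t, type) and issubclass(t, FIGURE_TYPES) for t in components
    )
    has_tables = any(
        isinstance(t, type) and issubclass(t, pd.DataFrame) for t in components
    )
    return has_figures, has_tables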

@AnilSorathiya (Contributor) commented:

@johnwalz97 @juanmleng and @cachafla
Just wondering: how about displaying the return types of tests in a single column, where the output types are listed as values in list form, similar to "required inputs" -> ["dataset", "model", "models", ...]?
For example, an "Output types" column in the table:
[table, figure, text, ...]

It would allow us to add more return types without disturbing list_tests().

@johnwalz97 (Contributor, Author) commented:

> Just wondering: how about displaying the return types of tests in a single column, where the output types are listed as values in list form? […]

That's not a bad idea. The only thing is that it makes it a bit more difficult to filter the DataFrame down to only tests that have tables, or only tests that have figures, but it would definitely make the table a bit more compact.

The thing is, right now it's only set up to tell whether a test outputs a table or a figure. It shouldn't be that difficult to add detection for the other types; it's just a question of whether we want it. Let me know what you guys think.
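
As a rough illustration of the single-column idea, the two flags could be collapsed into an "Output types" list along these lines (the helper below is hypothetical, just to show the shape of the data):

def output_types(has_figures, has_tables, has_text=False):
    """Collapse individual flags into a single list-valued column,
    similar in spirit to the "Required Inputs" column."""
    types = []
    if has_tables:
        types.append("table")
    if has_figures:
        types.append("figure")
    if has_text:
        types.append("text")
    return types


# A test that returns both a table and a figure:
print(output_types(has_figures=True, has_tables=True))  # ['table', 'figure']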

@cachafla (Contributor) commented:

@johnwalz97 how does the user filter the list of tests for "show me tests that output figures"?

@johnwalz97 (Contributor, Author) commented:

> @johnwalz97 how does the user filter the list of tests for "show me tests that output figures"?

Right now, they would have to programmatically filter the list that's returned if they don't use the pretty argument, or they would have to get the DataFrame from the Styler object and filter that. There isn't a parameter for filtering right now; do you want to add that as part of this ticket?
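
For example, filtering could currently be done along these lines. The column names "Has Figures" and "Has Tables" are assumptions based on this PR's description, so check the actual DataFrame columns:

from validmind.tests.load import list_tests

# list_tests() returns a styled table (a pandas Styler) by default;
# the underlying DataFrame is available through its .data attribute
styler = list_tests()
df = styler.data

# Keep only the tests flagged as producing figures
figure_tests = df[df["Has Figures"] == True]
figure_tests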

@github-actions (Contributor) commented:

PR Summary

This pull request introduces extensive updates across the codebase by adding explicit return type annotations and updating function signatures with appropriate type hints. The changes span multiple modules and directories including tests, ongoing monitoring, prompt validation, and unit metrics for both classification and regression.

Key enhancements include:

  • Insertion of return type annotations for functions that previously lacked them, ensuring clearer contracts between functions and improving static analysis.
  • Addition of necessary imports from the typing module (e.g. Tuple, List, Dict, Optional) and other related modules to support the type annotations.
  • Overall alignment of function signatures across various test modules (model validation, ongoing monitoring, prompt validation) and unit metrics, which helps in improving code readability and maintainability.
  • Consistent update of functions' return types such as using Tuple for multiple return values and float for scalar metrics.

These changes do not alter the existing logic or functionality. Instead, they serve to improve the code quality, ease future modifications, and facilitate better static type checking and code analysis.
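
For illustration, the added annotations are of this general form (the function below is a generic example, not a specific test from the repository):

from typing import Dict, List, Tuple

import plotly.graph_objects as go


# Before: the test function had no return annotation.
# After: the explicit return type documents that this test produces
# both a table (a list of row dicts) and a Plotly figure.
def my_example_test(dataset) -> Tuple[List[Dict[str, float]], go.Figure]:
    ...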

Test Suggestions

  • Run the full suite of unit tests and integration tests to verify that no functionality has been altered due to the introduction of type annotations.
  • Use a static type checker (e.g., mypy) to ensure all type hints are correctly applied and there are no inconsistencies.
  • Manually review key functions in test modules for runtime behavior, particularly where complex return types (e.g., tuples with multiple types) are used.
  • Execute performance tests to confirm that the changes in annotations have no adverse effects on runtime performance.

@cachafla (Contributor) left a comment:

I updated notebooks/how_to/explore_tests.ipynb. I suggest we merge this and address improvements later (like filtering or adding something to describe_test) since this PR makes changes to every test.

@johnwalz97 merged commit fe722e9 into main on Jul 17, 2025
7 checks passed
@johnwalz97 deleted the juan/sc-10379/list-return-types-in-the-list_tests-output branch on July 17, 2025 at 22:57

Labels

enhancement New feature or request

4 participants